Abstract

Modularity Q is an important function for identifying community structure in complex
networks. In this paper, we prove that the modularity maximization problem
$\max_{S \in \mathbb{S}} Q = \mathrm{Tr}(S^T B S)$ is equivalent to a nonconvex quadratic
programming problem $\max_{S \in \mathbb{S}} Q_m = \mathrm{Tr}(S^T (B + D)^m S)$. This result
provides a simple way to improve the efficiency of heuristic algorithms for maximizing
modularity Q. Numerical results on artificial and real-world networks demonstrate that it
is effective.
1 Introduction

Complex networks have received an enormous amount of attention in recent years [1], [2], [3].
Scientists have become interested in networks describing the topologies of a wide variety
of systems, such as the World Wide Web, social and communication networks, biochemical
networks, and many more. Once a system is represented as a complex network, many
quantitative methods can be applied to extract the characteristics embedded in it. One
important quantitative method is the analysis of community structure [2], [5]. Distinct
communities within networks can loosely be defined as subsets of nodes that are more
densely linked to each other than to the rest of the network. Nodes belonging to a
tight-knit community are likely to have other properties in common. In the World Wide Web,
community analysis has uncovered thematic clusters. In biochemical or neural networks,
communities may be functional groups [2], [5], and separating the network into such groups
could simplify the functional analysis considerably. As a result, the identification of
communities has been the focus of many recent efforts.

Maximizing modularity Q is the most widely accepted method for detecting community
structure among many algorithms [21], [22], although it has been proved that the modularity
index may fail to identify small modules [23]. Modularity Q was presented as an index of
community structure by Newman and Girvan. It is defined as $Q = \sum_r (e_{rr} - a_r^2)$,
where $e_{rr}$ is the fraction of links that connect two nodes inside community $r$, $a_r$
is the fraction of links that have one or both vertices inside community $r$, and the sum
extends over all communities $r$ in a given network. Note that this index provides a way to
determine whether a certain description of the graph in terms of communities is more or
less accurate. Generally speaking, the larger the value of Q, the more accurate the
partition into communities, so community structure can be detected by maximizing Q. Many
algorithms maximize Q directly, such as extremal optimization (EO), greedy algorithms, and
other optimization algorithms. In fact, these are usually heuristic algorithms, because the
modularity maximization problem has been proved to be NP-complete in the strong sense by
Ulrik Brandes et al. [24].
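As a small illustration of the index $Q = \sum_r (e_{rr} - a_r^2)$, the following sketch evaluates it for a given partition. It is not taken from the paper: the function name `modularity`, the edge-list input format, and the toy graph are hypothetical, and $a_r$ is computed in the usual Newman-Girvan way, as the fraction of edge ends attached to community $r$, which is an assumption about the intended definition.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum_r (e_rr - a_r**2) for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    inside = defaultdict(int)      # edges with both ends in community r
    degree_sum = defaultdict(int)  # edge ends attached to community r
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    # e_rr = inside[r] / m, a_r = degree_sum[r] / (2 m)  (assumed definition)
    return sum(inside[r] / m - (degree_sum[r] / (2 * m)) ** 2 for r in degree_sum)

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

On this toy graph the two-triangle partition scores higher than putting every node into a single community, which is the behaviour the index is meant to capture.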

Can we improve the efficiency of the corresponding heuristic algorithms by a detailed
investigation of the mathematical structure of modularity Q? According to Ref. [12],
$\max Q = \sum_r (e_{rr} - a_r^2)$ can be simplified to $\max Q = \mathrm{Tr}(S^T B S)$. In
this paper, we prove that $\max Q = \mathrm{Tr}(S^T B S)$ is a nonconvex quadratic 0-1
programming problem. Let
$D = \mathrm{diag}(\sum_{i=1}^{n} |B_{1,i}|, \sum_{i=1}^{n} |B_{2,i}|, \ldots, \sum_{i=1}^{n} |B_{n,i}|)$,
where $B_{i,j}$ is an element of $B$. Then $\max Q = \mathrm{Tr}(S^T B S)$ is equivalent to
$\max Q_m = \mathrm{Tr}(S^T (B + D)^m S)$ for every positive integer $m$. $(B + D)^m$ is a
symmetric positive semidefinite matrix, so the modularity maximization problem can be
mapped to a continuous nonconvex quadratic programming problem. These theorems are detailed
in Section 2. We have carried out numerical experiments on artificial and real-world
networks, namely a physics-economics scientists' cooperation network, an E. coli network,
and a college football network, and found that a suitably large m is very helpful for two
basic neighborhood transformation algorithms and for the EO algorithm when maximizing Q.
This suggests that our results may enhance the efficiency of many heuristic algorithms.

2 Theorems about modularity maximization problem 

Newman and Girvan proposed the modularity index Q based on the common experience that
such networks seem to have communities in them: subsets of nodes within which node-node
connections are dense, but between which connections are less dense [5]. According to [12],
modularity Q can be simplified. Suppose we have a network N with n nodes, represented
mathematically by an adjacency matrix A with elements $A_{ij} = 1$ if there is an edge from
i to j and $A_{ij} = 0$ otherwise. Let $d_i$ denote the degree of node i and let P be the
matrix with elements $P_{ij} = \frac{d_i d_j}{\sum_k d_k}$. Without loss of generality we
assume that the network N has n communities (if the number of communities is less than n,
the remaining ones are represented by zero vectors). Suppose $S = (S_1, S_2, \ldots, S_n)$
is the community structure matrix, where $S_i \in \{0, 1\}^n$ denotes the i-th community,
$i = 1, 2, \ldots, n$. For example, $S_i = (0, 1, 0, 1, 0, \ldots, 0)^T$ means that
community i contains only two nodes, node 2 and node 4. Because a node belongs to exactly
one community, each row of S has exactly one 1. We use $\mathbb{S}$ to denote the set of all
possible S. Let $B = A - P$. Then the modularity maximization problem is equivalent to
$\max_{S \in \mathbb{S}} Q = \mathrm{Tr}(S^T B S)$ [12], where Tr denotes the trace, i.e. the
sum of the diagonal entries of a matrix.
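The matrix form can be checked numerically with a minimal NumPy sketch. The helper names `modularity_matrix` and `community_matrix` and the toy graph are hypothetical, and the sketch assumes the null-model term $P_{ij} = d_i d_j / \sum_k d_k$; under that assumption $\mathrm{Tr}(S^T B S)$ equals $\sum_k d_k$ times the normalized modularity, so both have the same maximizer.

```python
import numpy as np

def modularity_matrix(A):
    """B = A - P with P_ij = d_i d_j / sum_k d_k (assumed null-model term)."""
    d = A.sum(axis=1)
    return A - np.outer(d, d) / d.sum()

def community_matrix(labels):
    """n x n 0/1 matrix S with S[i, r] = 1 iff node i is in community r.

    Unused communities are simply zero columns.
    """
    n = len(labels)
    S = np.zeros((n, n))
    S[np.arange(n), labels] = 1.0
    return S

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

B = modularity_matrix(A)
S = community_matrix(np.array([0, 0, 0, 1, 1, 1]))
trace_value = np.trace(S.T @ B @ S)        # unnormalized objective
q_normalized = trace_value / A.sum()       # divide by sum_k d_k
```

Each row of `S` contains exactly one 1, matching the constraint that a node belongs to exactly one community.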

Now we map the modularity maximization problem to a nonconvex quadratic 0-1 programming
problem. Let $S' = (S_1^T, S_2^T, \ldots, S_n^T)^T$ be the column vector obtained by
stacking the columns of S.

From the constraints we easily see that the set $\mathbb{S}$ contains $n^n$ elements,
$\mathbb{S} = \{S^1, S^2, \ldots, S^{n^n}\}$. According to the definition of $S'$ we also
have the corresponding set $\mathbb{S}' = \{S'^1, S'^2, \ldots, S'^{n^n}\}$.

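Theorem 1: Let
$D = \mathrm{diag}(\sum_{i=1}^{n} |B_{1,i}|, \sum_{i=1}^{n} |B_{2,i}|, \ldots, \sum_{i=1}^{n} |B_{n,i}|)$.
Then the problem $\max_{S \in \mathbb{S}} Q = \mathrm{Tr}(S^T B S)$ is equivalent to the
maximization problem $\max_{S \in \mathbb{S}} Q_1 = \mathrm{Tr}(S^T (B + D) S)$, which can be
mapped to a continuous nonconvex quadratic programming problem.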

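Proof:

$\because\ \mathrm{Tr}(S^T (B + D) S) = \mathrm{Tr}(S^T B S) + \mathrm{Tr}(S^T D S) = \mathrm{Tr}(S^T B S) + \sum_{i=1}^{n} D_{ii}$,
where the last term does not depend on S because each row of S contains exactly one 1,

$\therefore$ the problem $\max_{S \in \mathbb{S}} Q = \mathrm{Tr}(S^T B S)$ is equivalent to
the maximization problem $\max_{S \in \mathbb{S}} Q_1 = \mathrm{Tr}(S^T (B + D) S)$.

According to the Gerschgorin circle theorem [25], $B + D$ is a symmetric positive
semidefinite matrix.

Hence $\max_{S \in \mathbb{S}} Q_1 = \mathrm{Tr}(S^T (B + D) S)$ is a continuous nonconvex
quadratic programming problem.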
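The two facts used in the proof of Theorem 1 can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the small ring-plus-chord graph, the helper `random_partition`, and the use of $P_{ij} = d_i d_j / \sum_k d_k$ are hypothetical choices, not the paper's test setup. It checks that $\mathrm{Tr}(S^T (B + D) S) - \mathrm{Tr}(S^T B S)$ equals $\mathrm{Tr}(D)$ for random partitions and that $B + D$ has no negative eigenvalues.

```python
import numpy as np

n = 8
A = np.zeros((n, n))
for i in range(n):                       # ring 0-1-...-7-0
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A[0, 4] = A[4, 0] = 1.0                  # one chord

d = A.sum(axis=1)
B = A - np.outer(d, d) / d.sum()         # assumed form of B = A - P
D = np.diag(np.abs(B).sum(axis=1))       # D_ii = sum_j |B_ij|

def random_partition(rng, n):
    """Random 0/1 matrix with exactly one 1 in every row."""
    S = np.zeros((n, n))
    S[np.arange(n), rng.integers(0, n, size=n)] = 1.0
    return S

rng = np.random.default_rng(0)
shifts = []
for _ in range(100):
    S = random_partition(rng, n)
    shifts.append(np.trace(S.T @ (B + D) @ S) - np.trace(S.T @ B @ S))

# The shift should be the same constant Tr(D) for every partition.
shift_is_constant = np.allclose(shifts, np.trace(D))
# Smallest eigenvalue of the symmetric matrix B + D (Gerschgorin: >= 0).
min_eigenvalue = np.linalg.eigvalsh(B + D).min()
```

Because the shift is the same for every admissible S, adding D changes the objective value but not the maximizer.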

Theorem 2: For every positive integer m, the problem
$\max_{S \in \mathbb{S}} Q = \mathrm{Tr}(S^T B S)$ is equivalent to the maximization problem
$\max_{S \in \mathbb{S}} Q_m = \mathrm{Tr}(S^T (B + D)^m S)$.

Proof: $\because\ \max_{S \in \mathbb{S}} Q = \mathrm{Tr}(S^T B S)$

is equivalent to
$\max_{S \in \mathbb{S}} Q_1 = \mathrm{Tr}(S^T (B + D) S) = S'^T \, \mathrm{diag}(B + D, \ldots, B + D) \, S'$

and $S'^T S' = \mathrm{Tr}(S^T S) = n$
3 Application of the theorems

Based on Theorem 2, maximizing Q is equivalent to
$\max_{S \in \mathbb{S}} Q_m = \mathrm{Tr}(S^T (B + D)^m S)$. Can we enhance the efficiency of
heuristic algorithms for maximizing modularity Q by replacing the original problem with
this new maximization problem for a suitably large m? There are so many heuristic
algorithms for maximizing modularity Q that we cannot investigate all of them, and even if
we could, we could not guarantee that our method would suit future heuristic algorithms. It
is well known, however, that the key step of many heuristic algorithms, such as EO and the
Potts model approach [22], is to find optimal neighborhood transformations, where a
neighborhood transformation means moving a node from one community to another at each
optimization step. So if our method is effective for the basic neighborhood transformation
algorithms, it is likely to be effective for many other heuristic algorithms.

There are two basic neighborhood transformation algorithms. The first is the random
neighborhood transformation algorithm: we randomly initialize the partition (with a
sufficient number of groups); then, at each step, we randomly choose a node from one
community and move it into another community such that Q becomes larger, until no move of
any node can increase Q. The second is the greedy neighborhood transformation algorithm.
Its process is similar to that of the random one, except that at each step the node is
moved to the group that gives Q the largest increment.

We choose networks from four different fields to test our method. The first is the
classical artificial random network with n = 128 nodes divided into 4 communities of 32
nodes each. Edges between two nodes are introduced with different probabilities depending
on whether the two nodes belong to the same community or not: every node has
$\langle k_{intra} \rangle = 8$ links on average to its fellows in the same community and
$\langle k_{inter} \rangle = 8$ links to the outer world. We chose an artificial network
with diffuse community structure because, when the network contains clear community
structure, m has almost no effect on the final partition. The other three networks are the
scientists' cooperation network [57], the E. coli network [55], and the college football
network [5]. The results show that, for a suitably large m, our method helps to find larger
values of Q (as shown in Fig. 1 and Fig. 2). However, it is hard to say whether it needs
more or less time in the process of maximizing Q.
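The greedy neighborhood transformation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sweep order, the tolerance `tol`, the rescaling of $(B + D)^m$, and the toy graph are all assumptions made here. The moves are driven by the surrogate objective $\mathrm{Tr}(S^T (B + D)^m S)$, and the ordinary modularity is evaluated separately from B.

```python
import numpy as np

def scaled_power(B, m):
    """(B + D)^m with D = diag(sum_j |B_ij|), rescaled so max |entry| = 1.

    The rescaling is a numerical convenience assumed here; multiplying the
    objective by a positive constant does not change which move is best.
    """
    D = np.diag(np.abs(B).sum(axis=1))
    M = np.linalg.matrix_power(B + D, m)
    return M / np.abs(M).max()

def objective(M, labels):
    """Tr(S^T M S), written as the sum of M_ij over same-community pairs."""
    same = labels[:, None] == labels[None, :]
    return M[same].sum()

def greedy_moves(M, labels, rng, tol=1e-12):
    """Move one node at a time to its best community until no move helps."""
    labels = labels.copy()
    n = len(labels)
    improved = True
    while improved:
        improved = False
        for i in rng.permutation(n):
            row = M[i].copy()
            row[i] = 0.0  # M_ii contributes the same wherever node i sits
            link = np.bincount(labels, weights=row, minlength=n)
            best = int(np.argmax(link))
            # For symmetric M the change in Tr(S^T M S) is
            # 2 * (link[best] - link[current community]).
            if link[best] - link[labels[i]] > tol:
                labels[i] = best
                improved = True
    return labels

# Toy graph (hypothetical): two 4-cliques joined by the edge (3, 4).
n = 8
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
d = A.sum(axis=1)
B = A - np.outer(d, d) / d.sum()   # assumed form of B = A - P

rng = np.random.default_rng(1)
start = rng.integers(0, n, size=n)          # random initial partition
M = scaled_power(B, m=3)
final = greedy_moves(M, start, rng)
q_start = objective(B, start) / d.sum()     # ordinary modularity before
q_final = objective(B, final) / d.sum()     # ordinary modularity after
```

The random variant would differ only in the choice of target community: instead of `np.argmax`, any community with a positive gain could be picked at random.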
We also use the extremal optimization (EO) algorithm [21] to test our method. EO, a
heuristic algorithm, was proposed by Jordi Duch and Alex Arenas. In their algorithm, they
define a fitness for each node. The fitness $f_i$ of node i is defined as

$f_i = \frac{q_i}{k_i}$, (1)

where $k_i$ denotes the degree of node i and $q_i$ is the contribution of the individual
node i to Q. Let $e_i$ denote the n-dimensional vector whose i-th element is 1 and whose
other elements are 0; then

$q_i = e_i^T B S$. (2)

For the maximization problem $\max_{S \in \mathbb{S}} Q_m = \mathrm{Tr}(S^T (B + D)^m S)$, the
contribution is

$q_i^m = e_i^T (B + D)^m S$. (3)

Unfortunately, we cannot use the function $f_i^m = \frac{q_i^m}{k_i}$ (as in Eq. (1)) to
define the fitness, because it does not satisfy the original conditions (see [21]). So we
define the new fitness function as Eq. (3). Moreover, Jordi Duch and Alex Arenas did not
define the 'optimal state' quantitatively in [21]. In this paper, we consider that a
partition process has reached the optimal state at step t if the Q at step t is greater
than or equal to every Q from step t + 1 to step t + n, where n is the number of nodes in
the network.
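The per-node contribution and the stopping rule can be illustrated with a short sketch. Both helpers are hypothetical. In particular, the sketch assumes that the scalar contribution of node i is the entry of the row vector $e_i^T M S$ in the column of node i's own community (with $M = B$ or $M = (B + D)^m$), so that the contributions sum to $\mathrm{Tr}(S^T M S)$; the paper's exact convention may differ.

```python
import numpy as np

def node_contributions(M, labels):
    """q_i = sum of M_ij over nodes j in the same community as i (assumed)."""
    same = labels[:, None] == labels[None, :]
    return (M * same).sum(axis=1)

def optimal_step(q_history, n):
    """First step t whose Q is >= every Q at steps t+1, ..., t+n.

    Returns None when no step has a complete window of n later values.
    """
    for t in range(len(q_history) - n):
        if q_history[t] >= max(q_history[t + 1 : t + n + 1]):
            return t
    return None

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
d = A.sum(axis=1)
B = A - np.outer(d, d) / d.sum()            # assumed form of B = A - P
labels = np.array([0, 0, 0, 1, 1, 1])

q = node_contributions(B, labels)           # one value per node
total = q.sum()                             # equals Tr(S^T B S)
worst_node = int(np.argmin(q))              # a candidate for an EO move

# Made-up Q trace, used only to exercise the stopping rule with n = 3.
stop_at = optimal_step([0.10, 0.30, 0.20, 0.25, 0.10], n=3)
```

In an EO-style loop one would typically move a node with low fitness and record Q after every step, calling `optimal_step` on the recorded values to decide when to stop.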

We investigate extremal optimization with the new fitness function (NEO) for different
values of m and compare the NEO algorithm with the EO algorithm on the above four networks.
The results show that a suitably large m is very helpful both for maximizing modularity Q
and for reducing computing time, but sometimes an m that is too large is not helpful (as
shown in Fig. 3). We conjecture that one of the main reasons is that an overly large m
introduces larger numerical errors.

4 Conclusion and discussion 

We prove that the modularity maximization problem is equivalent to a nonconvex quadratic
programming problem. Based on the characteristics of nonconvex quadratic programming, we
demonstrate that the modularity maximization problem is equivalent to the maximization
problem $\max_{S \in \mathbb{S}} Q_m = \mathrm{Tr}(S^T (B + D)^m S)$. This conclusion provides
a simple way to improve the efficiency of algorithms for maximizing modularity Q. Numerical
experiments were carried out on different networks, including artificial networks, a
scientists' cooperation network, an E. coli network, and a college football network. The
results show that the new maximization problem with a suitably large m can enhance the
efficiency of heuristic algorithms for maximizing Q. In particular, it helps the EO
algorithm both in maximizing Q and in time complexity. However, determining the optimal m
rigorously remains a real challenge.
Figure 1: Results of the random neighborhood transformation algorithm.

Figure 3: These plots show the results of the NEO and EO algorithms on different networks.
The line marks the Q value obtained by EO (with the original fitness function). From these
plots we can conclude that a large m is very helpful both in maximizing Q and in reducing
time complexity. However, an m that is too large can sometimes overshoot, as in the results
for the college football network.